Method and apparatus for visual background subtraction with one or more preprocessing modules

ABSTRACT

Methods and apparatus are provided for visual background subtraction using one or more preprocessing modules. One or more effects are detected in a received image signal and one or more blocks are selectively enabled to preprocess the image signal to compensate for the detected one or more effects. Visual analysis is then performed on the preprocessed signal using background subtraction. A spatially-variant temporal smoothing of the image signal is also disclosed. The spatially-variant temporal smoothing can be achieved by the mixing of a new intensity value with a previous intensity time-average as determined by a weighting matrix. The mixing can be influenced by a dynamic bias term that is a real-time estimate of a variance at the pixel; the amount of mixing can be based on a degree of change observed at the pixel, and the weighting can be determined by a relative stability of an observed value compared to a stability of the time-average.

FIELD OF THE INVENTION

The present invention relates generally to image processing techniques and, more particularly, to techniques for visual background subtraction.

BACKGROUND OF THE INVENTION

Background subtraction is a popular technology for finding moving objects in images of an environment. Unfortunately, there are numerous factors that can adversely impact the efficacy of this class of techniques. Such disturbances include changes in camera responses due to automatic gain and color-balance corrections, image jitter due to vibration or wind, perceptually-masked artifacts due to video compression or cabling inadequacies, and varying object size due to lens distortion or imaging angle.

Some of these problems have simple solutions, but they are not optimal. While video can be transmitted and recorded in an uncompressed state, the required bandwidth and disk-storage space increase costs significantly. Similarly, lens distortions can be remedied by purchasing better (albeit more expensive) optics. Although it is possible to correct imaging geometry, this is difficult in practice because it involves moving cameras to optimal viewing locations. Such locations may be inconvenient (e.g., requiring significantly longer cable runs) or not feasible (e.g., above the ceiling level).

The solutions to other problems are not as straightforward. When the camera shakes due to wind or other vibration, for example, the current image acquired by the camera will not exactly line up with a previously captured reference image. This leads to detection of image changes (particularly near edges or in textured regions) that are not due to independent objects. Stabilizing the images produced by such surveillance cameras eliminates these artificial detections.

Stabilization can be accomplished by mechanically moving the camera in response to inertial measurements, or by altering portions of the optical path (e.g., sliding prisms) in response to similar error signals. However, these solutions require changing the cameras that are already installed. Also, these solutions are typically bulkier than an ordinary fixed camera and hence may be difficult to install in some locations. Stabilization may also be performed electronically (as in some camcorders) by shifting the pixel read positions on a digital image sensor. However, these pixel shifts are typically integer pixel shifts that are not accurate enough to remove all the artifacts generated by background subtraction. Another option is to use image warping based on optical flow analysis. However, this analysis is mathematically complicated, thus necessitating either a lower video frame rate or a more expensive computation engine.

Many cameras have built-in circuitry or algorithms for automatic gain control (AGC) and automatic white balance (AWB). These mechanisms typically generate video images that are more pleasing to the human eye. Unfortunately, these corrections can impair machine analysis of the images because there are frame-to-frame variations that are not due to any true variation in the imaged environment. Background subtraction is particularly affected by this phenomenon, which can cause large portions of the image to be falsely declared as foreground. Some cameras allow AGC and AWB to be disabled; however, this may not be true for all (possibly legacy) cameras in a video surveillance system. Also, it is sometimes desired to analyze previously recorded material where the source camera and its parameters cannot be controlled retroactively. While it is possible to correct exposure and color balance using techniques such as histogram stretching or contrast stretching, these whole-image methods can be confused if the content of the scene changes.

Furthermore, when using the legacy analog video transmission format RS-170, the color of a pixel is encoded as a phase-shifted chrominance signal riding on top of the standard amplitude-modulated intensity signal. Unfortunately, when separating these two signals to reconstruct the image representation, sharp changes in the intensity signal can be interpreted as color shifts. This can happen due to inadequate band limiting of the intensity signal at the source, poor “comb” filtering at the receiver, or nonlinear dispersion in the transmission medium (typically coax cable). This aliasing results in strobing color rainbow patterns around sharp edges. This can be disadvantageous for computer vision systems that need to know the true colors of regions, or for object detection and tracking systems based on background subtraction, which may erroneously interpret these color fluctuations as moving objects.

The impact of these color artifacts can be diminished by converting the image to monochrome (i.e., a black and white image) so that there are no color shifts, only smaller intensity variations. However, this processing removes potentially valuable information from the image. For instance, in a surveillance system it is useful to be able to discern the colors of different vehicles, something not possible in a gray-scale video. Another approach is to apply aggressive spatial smoothing to the image so that the “proper” adjacent colors dominate in the problem areas. However, this approach is sub-optimal in that the boundaries of objects (and sometimes even their identities) can be obscured by such blurring. Still another method would be to attempt to reconstruct the original two-part analog signal and then employ a more sophisticated chrominance-luminance separation filter. Unfortunately, video has often been subjected to a lossy compression method, such as MPEG (especially if it has been digitally recorded), in which case the exact details of the original waveform cannot be recovered with sufficient fidelity to permit this re-processing.

A further problem is that video images often contain “noise” that is annoying to humans and can be even more detrimental to automated analysis systems. This noise comes primarily from three sources: imager noise (e.g., pixel variations), channel noise (e.g., interference in cabling), and compression noise (e.g., MPEG “mosquitoes”). Effective removal or suppression of these types of noise leads to more pleasing visuals and more accurate computer vision systems. One standard method for noise removal is spatial blurring, which replaces a pixel by a weighted sum of its neighbors. Unfortunately, this tends to wash out sharp edges and obscure region textures. Median-based filtering attempts to preserve sharp edges, but still corrupts texture (which is interpreted as noise) and leads to artificially “flat” looking images. Another method, temporal smoothing, uses a weighted sum of pixels from multiple frames over time. This works well for largely stationary images, but moving objects often appear ghostly and leave trails behind.

Yet another difficulty is that background subtraction operates by comparing the current image with a reference image and highlights any pixel changes. Unfortunately, while often the desired result is the delineation of a number of physical objects, shadow regions are typically also marked because the scene looks different there as well. Eliminating or suppressing shadow artifacts is desirable because it allows better tracking and classification of a detected object (i.e., its form varies less over time and does not depend on lighting conditions).

One way to eliminate shadows is to first perform basic background subtraction and then to more closely examine the pixels flagged as foreground. For example, the hue, saturation, and intensity can be computed separately for the foreground pixel and the corresponding background pixel. If the hue and saturation measures are a close match, the intensities are then examined to see if they are within a plausible range of variations. If so, the pixel is declared a shadow artifact and removed from the computed foreground mask. Unfortunately, this method requires the computation of hue, which is typically expensive because it involves trigonometric operations. Moreover, hue is unstable in regions of low saturation or intensity (e.g., shadows). Finally, the derived hue is very sensitive to the noise in each color channel (the more noise, the less reliable the estimate).

A need therefore exists for improved techniques for visual background subtraction. A further need exists for methods and apparatus for visual background subtraction that address each of the above-identified problems using one or more software preprocessing modules.

SUMMARY OF THE INVENTION

Generally, methods and apparatus are provided for visual background subtraction using one or more preprocessing modules. According to one aspect of the invention, an image signal that has undergone previous corruption by one or more effects is processed. The one or more effects in the received image signal are detected and one or more blocks are selectively enabled to preprocess the image signal to compensate for the detected one or more effects. Thereafter, visual analysis, such as identifying one or more objects in the preprocessed image signal, is performed on the preprocessed signal using background subtraction.

The one or more blocks may selectively perform one or more of a jitter correction on the image signal, a color correction on the image signal, a contrast enhancement on the image signal, a cable-induced visual artifact reduction on the image signal, a spatially-variant temporal smoothing on the image signal, and a lens geometry normalization on the image signal.

According to another aspect of the invention, a spatially-variant temporal smoothing is performed on the image signal. Thereafter, the processed image is presented for visual analysis. The spatially-variant temporal smoothing can be achieved by the mixing of a new intensity value with a previous intensity time-average as determined by a weighting matrix. The mixing can be influenced by a dynamic bias term that is a real-time estimate of a variance at the pixel. The weighting can be determined by a relative stability of an observed value compared to a stability of the time-average, and an amount of the mixing is based on a degree of change observed at the pixel. The spatially-variant temporal smoothing can be achieved by associating one or more independent Kalman filters with each pixel position.

A more complete understanding of the present invention, as well as further features and advantages of the present invention, will be obtained by reference to the following detailed description and drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic block diagram of an image correction system incorporating features of the present invention;

FIG. 2 is a flow chart describing an exemplary implementation of a jitter correction (stabilization) method that may be employed by the jitter correction preprocessor of FIG. 1;

FIG. 3 is a flow chart describing an exemplary implementation of a color correction method that may be employed by the color correction preprocessor of FIG. 1;

FIG. 4 is a flow chart describing an exemplary implementation of an NTSC correction process that may be employed by the NTSC color correction preprocessor of FIG. 1;

FIG. 5 is a flow chart describing an exemplary implementation of a temporal smoothing process that may be employed by the temporal smoothing preprocessor of FIG. 1;

FIG. 6 is a flow chart describing an exemplary implementation of a lens normalization process that may be employed by the lens normalization preprocessor of FIG. 1; and

FIG. 7 is a flow chart describing an exemplary implementation of a shadow removal process 700 that may be employed by the shadow removal preprocessor 700 of FIG. 1.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The present invention provides methods and apparatus for visual background subtraction with one or more preprocessing modules. An input video stream is passed through one or more switchable, reconfigurable image correction units before being sent on to a background subtraction module or another visual analysis system. Depending on the environmental conditions, one or more modules can be selectively switched on or off for various camera feeds. For instance, an indoor camera generally does not require wind correction. In addition, for a single camera, various preprocessors might only be invoked at certain times. For example, at night, the color response of most cameras is poor, in which case they revert to essentially monochrome images. Thus, during the day, the signal from this camera might be processed to ameliorate the effect of chroma filtering (e.g., moving rainbow stripes at sharp edges), yet this module could be disabled at night.

The present invention copes with each of the problems identified above through the addition of software preprocessing modules that are easy to install and have small incremental costs (no new hardware is involved). This architecture allows the use of a straightforward technique of background subtraction, in conjunction with small, efficient preprocessing engines crafted for known shortcomings, rather than requiring the use of more elaborate (and often slower) general-purpose techniques like optical flow analysis. The present invention recognizes that even if a more sophisticated object detection technology is used to account for residual anomalies, algorithmically correcting the degradation processes which are understood and known to occur typically reduces the burden on the scene modeling component and can improve the overall system response time.

FIG. 1 is a schematic block diagram of an image correction system 100 incorporating features of the present invention. As discussed further below, the image correction system 100 performs visual background subtraction at stage 195, for example, to detect one or more objects in an image, and employs one or more preprocessing modules 200, 300, 400, 500, 600, 700, each discussed below in conjunction with FIGS. 2 through 7, respectively. The processed image may be obtained, for example, from a remote camera 110, and the images generally have undergone an image compression 120. The compressed image may be received, for example, over a transmission medium or channel 125, such as a wired or wireless link.

As shown in FIG. 1, the settings of each of the preprocessing modules 200, 300, 400, 500, 600, 700 may optionally be adjusted by an associated adjustment tap 134, 144, 154, 164, 174, for example, to configure each preprocessor 200, 300, 400, 500, 600, 700 with a custom set of parameters. In addition, in one implementation, each of the preprocessing modules 200, 300, 400, 500, 600, 700 may be selectively included or excluded from the image processing path by an associated switch 138, 148, 158, 168, 178. The parameters and switch settings can be different for different camera channels, and can vary over time in some scheduled or other requested manner.

It is noted that an image signal received by the image correction system 100 may have undergone previous corruption by one or more effects. The image correction system 100 can optionally initially detect the one or more effects in the received image signal. This might be done, for example, by having a human evaluate the image. In another variation, each preprocessor 200, 300, 400, 500, 600, 700 can be applied to the image to see if one or more of the preprocessors 200, 300, 400, 500, 600, 700 reduces the number of objects detected by the final background subtraction system 195. Since these are presumably false positives, such reductions indicate that the associated preprocessor should be enabled. Of course, the system could also use explicit detectors for the one or more effects. Such detectors are inherent in many of the correction blocks, i.e., if the relevant effect is absent, no correction is applied, as would be apparent to a person of ordinary skill in the art.

Stabilization Method

FIG. 2 is a flow chart describing an exemplary implementation of a jitter correction (stabilization) method 200 that may be employed by the jitter correction preprocessor 200 of FIG. 1. Generally, the proposed image stabilization subsystem 200 takes the foreground image at some spatial resolution and generates a number of alternative images based on shifting the image an integral number of pixels in the horizontal and/or vertical direction. These alternative images are compared to the reference image and a matching score is computed for each. Based on this set of scores, a best guess at a floating point sub-pixel offset is determined that aligns the current image with the reference image. This offset may be applied to normalize the current image (by shifting and interpolation) at the analyzed resolution, or at either higher or lower spatial resolutions by appropriate linear scaling of the offset parameters.

The illustrative embodiment of the stabilization subsystem 200 is strictly software, so it can be employed with legacy cameras and does not require the installation of new or bulkier hardware. The stabilization subsystem 200 is more efficient than optical flow methods, especially since the image can be analyzed at a lower resolution than standard, and thus requires less computational resources. Finally, the stabilization subsystem 200 generates sub-pixel estimates that provide the degree of correction required by the background subtraction algorithm.

Each incoming video frame is shifted in various ways and then compared to a stored reference image. In one preferred embodiment, the image is converted to monochrome by averaging the red, green, and blue channels (as is the reference image). Images are analyzed at their standard resolution, but the comparisons are only made at a selection of sample sites, typically evenly spaced to yield several thousand sites (e.g., a sampling unit of every 4th pixel in the horizontal and vertical directions). This allows fine scale detail to be used in the estimation procedure, but significantly reduces the computational demand. Also, because in many situations there is more pan than tilt, a cross-shaped search pattern is employed (rather than a full, and slower, search of all offsets within a set of ranges).

As shown in FIG. 2, a foreground image at a given spatial resolution is obtained during step 210. Thereafter, the jitter correction method 200 generates a plurality of alternative images during step 220, based on shifting the image an integral number of pixels in the horizontal and/or vertical direction. The alternative images are compared to a reference image during step 230, and a matching score is computed for each alternative image.

During step 240, a globally best integral offset is determined that aligns the foreground image with the reference image. Finally, a floating point sub-pixel offset is computed during step 250 that is used to generate a better aligned version of the current image.

In one exemplary implementation, a series of horizontal shifts in an exemplary range of ±4 pixels (including zero) is performed, and each resulting variation is compared with the reference image (at the same resolution and in monochrome). The comparison metric is the average absolute difference between corresponding selected pixel sites. The shift with the best score (least difference) is chosen and the scores of adjacent shifts (±1 pixel) are graphed and fit with a parabola. The lowest point on this parabola is then taken as the floating point sub-pixel horizontal shift. After this, the image is shifted by the best integer horizontal shift, as determined above, and then subjected to a series of additional vertical shifts in a range of typically ±2 pixels (including the zero case, which was already scored). As before, the mean absolute difference between each variation and the reference image is computed, the best integer shift is selected, and a floating point sub-pixel vertical estimate is formed by a parabolic fit of adjacent scores. If an estimated shift is close to the bounds of the search ranges, it is declared invalid and a shift of (0.0, 0.0) is reported instead.
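By way of illustration, the following sketch (Python with NumPy; the function names and the exact handling of the sample grid are assumptions rather than details fixed by the embodiment) shows one way the cross-shaped integer search with parabolic sub-pixel refinement could be realized:

```python
import numpy as np

def estimate_shift(cur, ref, mask=None, h_range=4, v_range=2, step=4):
    """Cross-shaped search for the jitter offset of `cur` relative to `ref`.

    cur, ref: monochrome frames (2-D float arrays) at the same resolution.
    mask: optional boolean array marking known-foreground samples to skip.
    Comparisons use every `step`-th pixel to keep the cost low.
    """
    h, w = ref.shape

    def score(dx, dy):
        # Mean absolute difference over the sampled, shifted grid.
        ys = np.arange(max(0, -dy), min(h, h - dy), step)
        xs = np.arange(max(0, -dx), min(w, w - dx), step)
        d = np.abs(cur[np.ix_(ys + dy, xs + dx)] - ref[np.ix_(ys, xs)])
        if mask is not None:
            d = d[~mask[np.ix_(ys, xs)]]
        return d.mean() if d.size else np.inf

    def refine(scores, best):
        # Parabola through the best score and its two neighbors; its
        # minimum gives the floating point sub-pixel offset.
        s0, s1, s2 = scores[best - 1], scores[best], scores[best + 1]
        denom = s0 - 2 * s1 + s2
        return 0.0 if denom == 0 else 0.5 * (s0 - s2) / denom

    hs = {dx: score(dx, 0) for dx in range(-h_range, h_range + 1)}
    bx = min(hs, key=hs.get)
    vs = {dy: score(bx, dy) for dy in range(-v_range, v_range + 1)}
    by = min(vs, key=vs.get)
    # Estimates at the search bounds are declared invalid, as in the text.
    if abs(bx) == h_range or abs(by) == v_range:
        return 0.0, 0.0
    return bx + refine(hs, bx), by + refine(vs, by)
```

The parabolic refinement uses only the best integer score and its two neighbors, so the sub-pixel estimate costs essentially nothing beyond the integer search itself.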

Because the jitter correction method 200 is intended for use in conjunction with a background subtraction system 195, it is easy to obtain a mask designating where (in the previous frame) foreground objects were detected relative to the stored background image. Sampled pixels that fall under this mask are omitted from the mean absolute difference calculation. This keeps the estimator from attempting to track large foreground objects (presuming they are detected) instead of the background as intended.

If a valid, significantly small shift is estimated (typically less than 1/10 pixel) and it has been a long time since the reference image was initialized (typically 100 frames), then the background reference image is updated by simply copying the current frame. If the current image is known to be an invalid background model in certain regions due to the presence of foreground objects, a separate background validity image is also stored corresponding to the current mask. The system 200 then ignores pixel samples that fall under either this mask, or the newly supplied mask for each input frame, as explained above.

The final stabilized image is generated by bi-linear interpolation of the original image at the best estimated offset. Since the input and output images are the same size, the mixing coefficients to generate each pixel from its four nearest neighbors are always the same. Moreover, since there are only a discrete number of possibilities for intensity (0 to 255), it is possible to pre-compute four tables that yield the appropriately scaled responses for each of the four neighbors. To convert the image, an integer pixel offset can be added to the read pointer and then the values of the four neighbors are used as indices into the pre-computed tables and the lookup values summed to produce the desired output pixel value. The same procedure and tables can be used for each of the red, green, and blue channels in a color image.
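A minimal sketch of this table-based interpolation follows (Python with NumPy; names are illustrative, and the wrap-around edge handling via np.roll is a simplification not specified by the text):

```python
import numpy as np

def bilinear_tables(fx, fy):
    """Pre-compute the four 256-entry lookup tables for a fixed
    fractional offset (fx, fy): each maps an 8-bit neighbor value
    to its weighted contribution."""
    w00 = (1 - fx) * (1 - fy)   # top-left neighbor weight
    w10 = fx * (1 - fy)         # top-right
    w01 = (1 - fx) * fy         # bottom-left
    w11 = fx * fy               # bottom-right
    v = np.arange(256, dtype=np.float32)
    return w00 * v, w10 * v, w01 * v, w11 * v

def shift_channel(ch, dx, dy):
    """Shift one 8-bit channel by a floating-point offset (dx, dy)
    using an integer move plus table-based bilinear interpolation.
    The same tables serve the red, green, and blue channels alike."""
    ix, iy = int(np.floor(dx)), int(np.floor(dy))
    t00, t10, t01, t11 = bilinear_tables(dx - ix, dy - iy)
    # Integer part of the offset: move the read position.
    s = np.roll(np.roll(ch, -iy, axis=0), -ix, axis=1)
    # Fractional part: sum the four neighbors' table lookups.
    out = (t00[s] + t10[np.roll(s, -1, axis=1)]
           + t01[np.roll(s, -1, axis=0)]
           + t11[np.roll(np.roll(s, -1, axis=0), -1, axis=1)])
    return out.astype(np.uint8)
```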

Color Correction Method

FIG. 3 is a flow chart describing an exemplary implementation of a color correction method 300 that may be employed by the color correction preprocessor 300 of FIG. 1. Generally, the image correction subsystem 300 operates by estimating one or more multiplicative channel gains to account for differences between the current video frame (image) and a stored reference frame (image). For a monochrome source, one gain value is estimated. For a color video, typically three channel gains (red, green, and blue) are estimated or, alternatively, one overall gain (as for monochrome) and three differential channel gains (for RGB). These multiplicative factors are then applied to each pixel in the current frame to generate an image more similar in overall appearance to the stored reference frame.

As shown in FIG. 3, a foreground image is initially obtained at a spatial resolution during step 310. The foreground image is then compared to a stored reference frame on a pixel-by-pixel basis during step 320. The overall multiplicative gain(s) for each color channel are estimated during step 330 to account for differences between the foreground image and the stored reference frame. Finally, the multiplicative factors are applied to each pixel in the current frame during step 340 to generate a corrected image.

In one embodiment, the global gain estimates are computed using histograms of the individual gain estimates derived for each pixel position. The peak (mode) of the smoothed histogram is chosen as the optimal correction factor thereby making the system robust to moderate scene changes (which give rise to secondary peaks without moving the primary peak). In an alternative histogram stretching method, the introduction of such a disturbance would lead to an inappropriate stretching of the normalization transfer function over the color region related to the scene change.

In an exemplary implementation, there is a stored reference image B(x, y) and the current image V(x, y). Conceptually, for each pixel position (x, y) a factor f(x, y) = B(x, y)/V(x, y) is computed. These individual estimates are collected into a histogram H(f) over a range of possible correction values, where H(f) may be optionally smoothed by an operation such as averaging of adjacent bins. Finally, the index f′ of the bin in H(f) with the maximum value is selected as the best gain correction factor. A new image V′(x, y) = f′*V(x, y) is then generated as a result of the process.
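For illustration, a minimal sketch of this histogram-mode gain estimate (Python with NumPy; the bin count, gain range, sampling step, and intensity thresholds are illustrative choices consistent with, but not fixed by, the text):

```python
import numpy as np

def estimate_gain(ref, cur, bins=200, lo=0.5, hi=2.0, step=8):
    """Histogram-mode estimate of a single multiplicative gain f' such
    that f' * cur best matches ref, sampled on a sparse pixel grid."""
    b = ref[::step, ::step].astype(np.float64).ravel()
    v = cur[::step, ::step].astype(np.float64).ravel()
    # Drop near-black and near-saturated samples, which are noisier.
    ok = (v > 10) & (v < 245) & (b > 10) & (b < 245)
    if ok.sum() < 100:      # too few usable samples: caller reuses old gain
        return None
    f = b[ok] / v[ok]       # per-pixel factor f(x, y) = B/V
    hist, edges = np.histogram(f, bins=bins, range=(lo, hi))
    # Light smoothing by averaging adjacent bins before taking the mode.
    smooth = np.convolve(hist, np.ones(3) / 3.0, mode="same")
    peak = int(np.argmax(smooth))
    return 0.5 * (edges[peak] + edges[peak + 1])   # bin center = f'
```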

In one preferred embodiment of the color correction system 300, for reasons of speed, only some fraction of the pixels in B and V are examined. Typically, several thousand pixel sites (randomly or systematically distributed) are sufficient to generate a valid gain estimate for images of arbitrarily large sizes. Also, in the preferred embodiment the gain estimates f′(t) are smoothed over time with something like a recursive filter, f″(t) = a*f′(t) + (1−a)*f″(t−1), to account for the slowly varying nature of typical AGC/AWB circuitry. It is this smoothed value, f″(t), that is used to correct the pixel intensities.

Estimates from some of the selected pixel sites can optionally be disregarded. Since the correction method is used in conjunction with a background subtraction object finding system, any pixels corresponding to known foreground objects (as determined from the previous frame) are omitted from the histogram. Similarly, pixels with intensities that are either very high (e.g., saturated) or low (e.g., nearly black) are omitted because these estimates tend to be noisier than others. If too few pixels remain (in any one of the channels), the overall gain estimate calculations are terminated and the gain factor most recently derived is re-used.

For color images, three separate channel gains r″(t), g″(t), b″(t) are generated and maintained in the exemplary color correction method 300 described above. However, they are reported as an overall gain y(t) and differential gains dr(t), dg(t), db(t). The overall gain y(t) is derived by taking the average of the three channel gains, and then clipping the value to a known valid range of gains. The individual channel differential gains are then computed relative to this overall gain (e.g., dr(t) = r″(t)/y(t)) and similarly clipped to a different known valid range of values. This prevents unreasonable compensation parameters from being used in the correction phase (i.e., Vr′(x, y, t) = y(t)*dr(t)*Vr(x, y, t)).

For use with background subtraction, the gains are also used to alter a reference background image which will be compared with the newly corrected video frame. In this operation, the value of a pixel in some channel is limited to be less than or equal to 255 (the maximum pixel value) times the gain for that channel. The rationale for this is that, for a gain <1.0, this value is the largest value that could be generated for that color channel in the newly corrected frame. This prevents the system from flagging differences at pixels that are bright in the current image but which could not be properly down-corrected (since their true value was unknown).

NTSC Artifact Reduction Method

FIG. 4 is a flow chart describing an exemplary implementation of an NTSC correction process 400 that may be employed by the NTSC color correction preprocessor 400 of FIG. 1. Generally, the NTSC correction subsystem 400 suppresses color information around problematic edges. The NTSC correction subsystem 400 initially directly finds sharp vertical transitions in the corrupted image, then generates a soft mask around these areas that is used to gradually blend in a monochrome version of the image. This method 400 allows color information to be retained for the bulk of the image while simultaneously minimizing the effect of rainbow artifacts. As no spatial averaging is involved, the resulting image retains the full resolution of the original. Moreover, the method works equally well on native or compressed video streams.

As shown in FIG. 4, the NTSC correction process 400 initially prepares a monochrome version of the corrupted image during step 410. Thereafter, sharp vertical intensity transitions are identified in the corrupted image during step 420. A soft mask is generated around these areas during step 430, and then the soft mask is used during step 440 to gradually blend in the monochrome version of the image with the corrupted image to generate a corrected image.

In one preferred embodiment, step 410 is performed by averaging the red, green, and blue color channels at each pixel. Step 420 can be done by applying horizontal and vertical Sobel masks and analyzing the convolution responses. Various edge pixels are then selected based on their directions to form an intermediate mask image. In one preferred embodiment, edge pixels with orientations of +45 to +135 degrees or +225 to +315 degrees have their magnitudes multiplied by a factor of 10 and gated to the intermediate image (so a true edge magnitude of 25.5 or higher will yield a fully white, 255, mask pixel). All other pixels are set to zero.

The intermediate mask image is then processed by a local averaging operator (essentially a convolution with a uniform-value rectangular block mask). This spreads the influence of the detected edge to adjacent pixels and also “feathers” the edges of the mask. In one preferred embodiment, the local averaging occurs over a 5×3 pixel area centered on the original pixel and multiplies the resulting value by a factor of 3.

In the final step 440, the blurred mask (M) is used to mix together pixels of the original image (I) with pixels from a monochrome version (G) of the image. This monochrome image can be the same as the one used for edge finding, but conceptually it could be formed by some different process (e.g., an unevenly weighted combination of red, green, and blue values):

I′_c(x, y) = [1 − (M(x, y)/255)]*I_c(x, y) + [M(x, y)/255]*G(x, y)

This is the final output (I′) of the process 400, where c is a color channel, such as the red component of some pixel.
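For illustration, a minimal sketch of the whole pipeline (Python with NumPy and SciPy; the use of gradient-angle-plus-90-degrees as the edge-orientation convention, and the exact Sobel magnitude scaling, are assumptions rather than details fixed by the embodiment):

```python
import numpy as np
from scipy import ndimage

def suppress_rainbow(img):
    """img: H x W x 3 uint8 RGB frame. Returns a frame in which color is
    faded toward monochrome near sharp vertical intensity transitions."""
    gray = img.astype(np.float32).mean(axis=2)            # step 410
    gx = ndimage.sobel(gray, axis=1)                      # horizontal gradient
    gy = ndimage.sobel(gray, axis=0)                      # vertical gradient
    mag = np.hypot(gx, gy)
    # Edge orientation taken as gradient direction rotated by 90 degrees
    # (an assumed convention), so roughly vertical edges fall in the bands.
    theta = (np.degrees(np.arctan2(gy, gx)) + 90) % 360
    band = ((theta >= 45) & (theta < 135)) | ((theta >= 225) & (theta < 315))
    inter = np.where(band, np.clip(mag * 10, 0, 255), 0)  # step 420
    # 5 wide x 3 tall local average, boosted by 3, feathers the mask.
    mask = np.clip(ndimage.uniform_filter(inter, size=(3, 5)) * 3, 0, 255)
    alpha = (mask / 255.0)[..., None]                     # step 430
    out = (1 - alpha) * img + alpha * gray[..., None]     # step 440
    return out.astype(np.uint8)
```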

Temporal Smoothing Method

FIG. 5 is a flow chart describing an exemplary implementation of a temporal smoothing process 500 that may be employed by the temporal smoothing preprocessor 500 of FIG. 1. Generally, the temporal smoothing subsystem 500 attempts to preserve all the spatial detail in an image while suppressing time-varying noise. The basic structure of the processing engine is a set of independent Kalman filters, conceptually one at each pixel position. New values of intensity at a pixel are mixed in with the previous time-average using a weighting determined by the relative stability of the observed value versus the stability of the time-average. However, instead of just assuming a fixed noise figure for each observation and a monotonically decreasing noise figure for the average, the noise figure for the average is directly tweaked on each cycle in the exemplary embodiment based on the current observation.

According to one aspect of the invention, the disclosed temporal smoothing process 500 applies temporal smoothing to all pixels in an image, but the amount of smoothing depends on how much change has been observed at that pixel. In areas where motion is detected, the temporal smoothing is basically turned off, yet it is reapplied once the region settles down. While this does not suppress noise to the same extent as straight temporal averaging would, it is much more responsive to moving objects. Moreover, the degree of motion responsiveness is smoothly controlled by a continuous variable rather than having the system make firm decisions on motion or no-motion. This preserves object boundaries better (especially where they are somewhat indistinct) and acts to conceal any slight mistakes the system might make in its classification.

As shown in FIG. 5, the temporal smoothing process 500 initially finds pixel-wise differences between the current image and the previous smoothed image during step 510. Thereafter, during step 520, the temporal smoothing process 500 computes a pixel-wise stability estimate based on previous pixel variances and current differences.

A pixel-wise weighting factor is generated during step 530 based on the stability estimates and a channel noise estimate. A new smoothed image is generated during step 540 by mixing in the current image using the pixel-wise weighting factors. Finally, a new pixel variance is generated during step 550 using the weighting factors and the pixel-wise stability estimates.

A Kalman filter can be described with two equations: one for the measurement (M) and one for the process (P):

measurement: M = P + Vm, where Vm equals the variance in the measurement;
process: P′ = b*P + c, where c is the expected jumpiness and b is a time decay constant.

These equations can be used by the temporal smoothing process 500 to generate the standard Kalman update equations. The mixing of the new observation (M) with the previous average (P) during step 540 is determined by the Kalman matrix (here, just the value k). After absorbing the new measurement, the system 500 retains the new estimates of the average (P′) and the variance (V′) for use on the next cycle:

d = M − P, where M equals the current image;
k = V/(V + n), where n equals the measurement noise constant;
P′ = P + k*d, where P′ equals the new average;
V′ = V − k*V, where V′ equals the new variance.

In an exemplary embodiment, instead of using just the computed variance of the estimate to construct the Kalman mixing factor, a dynamic bias term is also included that is a real-time, one-sample estimate of the variance at the pixel:

d = M − P, where M equals the current image;
s = V + f*(d*d − V), where f equals the mixing constant;
k = s/(s + n), where n equals the measurement noise constant;
P′ = P + k*d, where P′ equals the new average;
V′ = s − k*s, where V′ equals the new variance.

Note that d*d = (M−P)² equals the square of the difference between the current observation and the longer-term average. It is this new “s” term that causes the temporal averaging to be turned off when objects move. If the pixel is much different from what is expected, s goes up, which in turn raises k, the proportion by which the new measurement is blended with the longer-term average.

In one preferred embodiment for video at 30 frames per second and for pixel intensities in the range of 0 to 255, f equals 0.1 and n equals 64. Also, for color images, separate versions of the estimator are run for the red, green, and blue values at each pixel. The “clean” image is generated by reporting the averages (P′) for each estimator in place of the original observed intensities (M).
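The update equations above map directly onto a vectorized per-pixel implementation. The following sketch (Python with NumPy; the class name and output clipping are illustrative) maintains the average P and variance V as floating-point images, and can be run on each color channel separately or on an H x W x 3 array directly:

```python
import numpy as np

class TemporalSmoother:
    """Per-pixel temporal smoother implementing the biased Kalman update."""

    def __init__(self, first_frame, f=0.1, n=64.0):
        self.P = first_frame.astype(np.float32)  # running average
        self.V = np.zeros_like(self.P)           # running variance
        self.f = f                               # mixing constant
        self.n = n                               # measurement noise constant

    def update(self, frame):
        M = frame.astype(np.float32)
        d = M - self.P                           # d = M - P
        s = self.V + self.f * (d * d - self.V)   # dynamic bias term
        k = s / (s + self.n)                     # k = s / (s + n)
        self.P = self.P + k * d                  # new average P'
        self.V = s - k * s                       # new variance V'
        # Report the averages in place of the observed intensities.
        return np.clip(self.P, 0, 255).astype(np.uint8)
```

Where a pixel changes abruptly, d*d inflates s, driving k toward one so the new observation dominates; where the pixel is quiet, s decays and the longer-term average dominates, exactly the soft motion gating described above.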

Lens Normalization

FIG. 6 is a flow chart describing an exemplary implementation of a lens normalization process 600 that may be employed by the lens normalization preprocessor 600 of FIG. 1. As shown in FIG. 6, the corrected image is split into a collection of independent pixel positions during step 610. Thereafter, a radial distortion correction equation is used to determine the fractional pixel position in the input image closest to the source for the corrected pixel during step 620.

Standard radial lens distortion correction can be accomplished by applying the following equations:

x′ = x + (x − x0)*(sc2*r² + sc4*r⁴); and
y′ = y + (y − y0)*(sc2*r² + sc4*r⁴),

where (x′, y′) is the new corrected position for a pixel, (x, y) is the original pixel location in the distorted image, r is the distance of the original pixel from the projection (x0, y0) of the optical lens center on the image plane, and sc2 and sc4 are fixed constants describing the curvature of the lens.

Interpolation is employed on values of the input image pixels closest to the fractional pixel position to generate a value for the corrected pixel during step 630. Finally, all independently interpolated pixels are recombined during step 640 to generate a corrected output image.
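A minimal sketch of steps 610 through 640 (Python with NumPy and SciPy; it assumes the sc2/sc4 polynomial is calibrated for the output-to-input direction so the same radial equation can serve as the inverse map, and it expects an H x W x 3 image):

```python
import numpy as np
from scipy import ndimage

def undistort(img, x0, y0, sc2, sc4):
    """Inverse-map each output pixel through the radial model and
    sample the input bilinearly (order=1)."""
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float32)  # step 610: per-pixel grid
    dx, dy = xx - x0, yy - y0
    r2 = dx * dx + dy * dy                          # r squared
    # Fractional source position for each corrected pixel (step 620);
    # note sc4*r^4 = sc4*(r^2)^2.
    scale = 1.0 + sc2 * r2 + sc4 * r2 * r2
    xs, ys = x0 + dx * scale, y0 + dy * scale
    out = np.empty_like(img)
    for c in range(img.shape[2]):                   # step 630: interpolate
        out[..., c] = ndimage.map_coordinates(
            img[..., c].astype(np.float32), [ys, xs],
            order=1, mode="nearest")
    return out                                      # step 640: recombined
```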

Shadow Removal Method

FIG. 7 is a flow chart describing an exemplary implementation of a shadow removal process 700 that may be employed by the shadow removal preprocessor 700 of FIG. 1. Generally, the shadow removal subsystem 700 pre-corrects the input image for possible shadow effects before passing the corrected image on to a standard background subtraction algorithm. The pre-correction involves adjusting the intensity of each pixel based on a gain factor computed from channel intensity ratios that have been weighted by channel noise estimates. In regions where there is little difference between the original input and the reference, the gain estimate will be close to one and hence there will continue to be little difference. In regions where there are significant differences (particularly in color), correcting the average intensity of a pixel will not generally make its color components match any better and hence there will still be a difference. It is only where absolute intensity correction is appropriate (namely, shadows and highlights) that pixels will have their determination changed (i.e., from being a difference, to not being a difference).

The disclosed shadow removal process 700 has a number of advantages. The shadow removal process 700 does not require expensive trigonometric calculations (and hence can be faster), will work in dim and bland regions (since it remains in the RGB color space), and will not be thrown off significantly by noisy images or video compression artifacts (e.g., a bad blue channel).

As shown in FIG. 7, a pixel-wise ratio between the current image and a reference image is determined for each color channel during step 710. The ratios are then combined during step 720 at each pixel using estimates of the relative noise in each color channel. The shadow removal process 700 then divides the value of each color channel by the combined ratio estimate at each pixel during step 730.

In one exemplary embodiment of the shadow removal process 700, for each pixel in the input image, its red, green, and blue color values are compared to those for the corresponding pixel in the reference image. Separate ratios are computed for each channel:

F_r = S_r/I_r, F_g = S_g/I_g, F_b = S_b/I_b.

Here, F_c is the gain correction factor estimate for channel c (either r=red, g=green, or b=blue), S_c is the value of channel c for the pixel in the stable reference image, and I_c is the value of the pixel in channel c for the input image. The three separate estimates are then each compared to a potential valid range of correction, such as 0.8× to 2.0×. If any individual estimate is outside these bounds, the gain for the pixel is set to one (and so no change is made). Otherwise, the individual estimates are combined based on the noise in each channel:

F = F_r/W_r + F_g/W_g + F_b/W_b,

where W_c = N_c*(1/N_r + 1/N_g + 1/N_b), N_c being the observed noise in channel c. Once F(x, y) has been calculated for each pixel, a corrected image is produced by multiplying through by the derived factors:

I′(x, y) = F(x, y)*I(x, y).
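A minimal sketch of this per-pixel gain correction (Python with NumPy; the function and variable names are illustrative, and the epsilon guard against division by zero is an added safeguard not in the text):

```python
import numpy as np

def shadow_correct(I, S, N, lo=0.8, hi=2.0):
    """I, S: H x W x 3 float arrays (input and stable reference).
    N: per-channel noise estimates (N_r, N_g, N_b)."""
    eps = 1e-6
    F_c = S / np.maximum(I, eps)                 # per-channel ratios S_c/I_c
    # Noise-derived weights: W_c = N_c * (1/N_r + 1/N_g + 1/N_b),
    # so lower-noise channels contribute more to the combined gain.
    inv_sum = sum(1.0 / n for n in N)
    W = np.array([n * inv_sum for n in N], dtype=np.float32)
    F = (F_c / W).sum(axis=2)                    # F = sum_c F_c / W_c
    # Any channel ratio outside the valid range forces a gain of one.
    invalid = ((F_c < lo) | (F_c > hi)).any(axis=2)
    F = np.where(invalid, 1.0, F)
    return np.clip(F[..., None] * I, 0, 255)     # I'(x, y) = F(x, y)*I(x, y)
```

Because the weights satisfy 1/W_r + 1/W_g + 1/W_b = 1, the combined F is a convex combination of the channel ratios, so an all-channels-equal shadow yields exactly that common ratio.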

In one preferred embodiment, the noise estimates are computed by comparing each input image with a reference image. Typically, the original image is heavily subsampled (e.g., every 4th pixel in the vertical and horizontal directions) to select only several thousand pixels for evaluation. Also, since this module 700 is typically used in conjunction with a background subtraction system 195, pixels that are known not to correspond to the background (i.e., pixels that are part of detected foreground objects) are omitted from the comparison. The absolute value of each selected pixel difference (|I_c(x, y) − S_c(x, y)| in a channel c) is then accumulated into a difference histogram for that channel.

The difference histogram itself is smoothed, using a method such as the averaging of adjacent bins, and the primary peak (maximum occupancy bin) is found. The falling edge of this peak is determined by locating the lowest index bin whose occupancy is less than some factor (e.g., 10%) of the peak value. The value (n) associated with this bin is a new estimate of the noise in the channel. This new value can either be reported directly or, in the preferred implementation, combined with the previous noise estimate using a temporal smoothing filter (e.g., N′_c = (1−k)*N_c + k*n, with k equal to 0.05 for 30 frames per second video).
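A sketch of this noise estimator for a single channel (Python with NumPy; the bin count and sampling step are illustrative, and the falling-edge search from the peak upward is one reasonable reading of the text):

```python
import numpy as np

def channel_noise(I_c, S_c, prev_N, fg_mask=None, step=4, k=0.05):
    """One channel's noise estimate from the histogram of |I - S| over a
    sparse sample grid, temporally smoothed against the previous value."""
    d = np.abs(I_c[::step, ::step].astype(np.float32)
               - S_c[::step, ::step].astype(np.float32))
    if fg_mask is not None:              # skip known foreground pixels
        d = d[~fg_mask[::step, ::step]]
    hist, _ = np.histogram(d, bins=256, range=(0, 256))
    smooth = np.convolve(hist, np.ones(3) / 3.0, mode="same")
    peak = int(np.argmax(smooth))
    # Falling edge: first bin past the peak below 10% of the peak value.
    below = np.nonzero(smooth[peak:] < 0.1 * smooth[peak])[0]
    n = float(peak + below[0]) if below.size else float(peak)
    return (1 - k) * prev_N + k * n      # N'_c = (1-k)*N_c + k*n
```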

In a further variation, one or more of the preprocessing blocks in the system 100 can perform a contrast enhancement on the image signal. Contrast enhancement can be implemented, for example, by determining what part of the dynamic range of pixel values is being used. In one implementation, a histogram is created of all the red, green, and blue pixel values, and then the 5th percentile point and the 95th percentile point of the distribution are identified. From these numbers, an offset and scale factor are calculated that will translate these points to fixed values such as 20 and 240, respectively. This effectively stretches the range of values being used without altering the hue information (which is based on color differences, not ratios).
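A minimal sketch of this percentile-based stretch (Python with NumPy; the target points 20 and 240 follow the example above, while the function name is illustrative):

```python
import numpy as np

def stretch_contrast(img, lo_pt=20.0, hi_pt=240.0):
    """Map the 5th/95th percentiles of the pooled RGB values to fixed
    points. The same scale and offset are applied to every channel,
    so color differences (and hence hue) are preserved up to scale."""
    p5, p95 = np.percentile(img, [5, 95])
    scale = (hi_pt - lo_pt) / max(p95 - p5, 1e-6)
    offset = lo_pt - scale * p5
    out = scale * img.astype(np.float32) + offset
    return np.clip(out, 0, 255).astype(np.uint8)
```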

System and Article of Manufacture Details

As is known in the art, the methods and apparatus discussed herein may be distributed as an article of manufacture that itself comprises a computer readable medium having computer readable code means embodied thereon. The computer readable program code means is operable, in conjunction with a computer system, to carry out all or some of the steps to perform the methods or create the apparatuses discussed herein. The computer readable medium may be a recordable medium (e.g., floppy disks, hard drives, compact disks, or memory cards) or may be a transmission medium (e.g., a network comprising fiber-optics, the world-wide web, cables, or a wireless channel using time-division multiple access, code-division multiple access, or other radio-frequency channel). Any medium known or developed that can store information suitable for use with a computer system may be used. The computer-readable code means is any mechanism for allowing a computer to read instructions and data, such as magnetic variations on a magnetic media or height variations on the surface of a compact disk.

The computer systems and servers described herein each contain a memory that will configure associated processors to implement the methods, steps, and functions disclosed herein. The memories could be distributed or local and the processors could be distributed or singular. The memories could be implemented as an electrical, magnetic or optical memory, or any combination of these or other types of storage devices. Moreover, the term “memory” should be construed broadly enough to encompass any information able to be read from or written to an address in the addressable space accessed by an associated processor. With this definition, information on a network is still within a memory because the associated processor can retrieve the information from the network.

It is to be understood that the embodiments and variations shown and described herein are merely illustrative of the principles of this invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention.

1. A method for processing an image signal, comprising: receiving an image signal that has been corrupted by one or more effects; detecting said one or more effects in said received image signal; selectively enabling one or more blocks to preprocess said image signal to compensate for said detected one or more effects; and performing visual analysis on said preprocessed signal using background subtraction.

2. The method of claim 1, wherein said visual analysis identifies one or more objects in said preprocessed image signal.

3. The method of claim 1, wherein said one or more blocks performs a jitter correction on said image signal.

4. The method of claim 1, wherein said one or more blocks performs a color correction on said image signal.

5. The method of claim 1, wherein said one or more blocks performs a contrast enhancement on said image signal.

6. The method of claim 1, wherein said one or more blocks performs a cable-induced visual artifact reduction on said image signal.

7. The method of claim 1, wherein said one or more blocks performs a spatially-variant temporal smoothing on said image signal.

8. The method of claim 1, wherein said one or more blocks performs a lens geometry normalization on said image signal.

9. A system for processing an image signal, comprising: a memory; and at least one processor, coupled to the memory, operative to: receive an image signal that has been corrupted by one or more effects; detect said one or more effects in said received image signal; selectively enable one or more blocks to preprocess said image signal to compensate for said detected one or more effects; and perform visual analysis on said preprocessed signal using background subtraction.

10. The system of claim 9, wherein said visual analysis identifies one or more objects in said preprocessed image signal.

11. The system of claim 9, wherein said one or more blocks performs a jitter correction on said image signal.

12. The system of claim 9, wherein said one or more blocks performs a color correction on said image signal.

13. The system of claim 9, wherein said one or more blocks performs a contrast enhancement on said image signal.

14. The system of claim 9, wherein said one or more blocks performs a cable-induced visual artifact reduction on said image signal.

15. The system of claim 9, wherein said one or more blocks performs a spatially-variant temporal smoothing on said image signal.

16. The system of claim 9, wherein said one or more blocks performs a lens geometry normalization on said image signal.

17. An article of manufacture for processing an image signal, comprising a machine readable medium containing one or more programs which when executed implement the steps of: receiving an image signal that has been corrupted by one or more effects; detecting said one or more effects in said received image signal; selectively enabling one or more blocks to preprocess said image signal to compensate for said detected one or more effects; and performing visual analysis on said preprocessed signal using background subtraction.

18. The article of manufacture of claim 17, wherein said one or more blocks performs one or more of a jitter correction on said image signal, a color correction on said image signal, a contrast enhancement on said image signal, a cable-induced visual artifact reduction on said image signal, a spatially-variant temporal smoothing on said image signal, or a lens geometry normalization on said image signal.

19. A method for processing an image signal, comprising: receiving an image signal that has been corrupted by one or more effects; selectively enabling one or more blocks to preprocess said image signal to compensate for said one or more effects; performing spatially-variant temporal smoothing to further preprocess said image signal; and presenting said preprocessed image signal for visual analysis.

20. The method of claim 19, wherein said visual analysis identifies one or more objects in said preprocessed image signal.

21. The method of claim 19, wherein said visual analysis uses background subtraction.

22. The method of claim 19, wherein said visual analysis is performed by a human watching a video screen.

23. The method of claim 19, wherein said spatially-variant temporal smoothing is achieved by the mixing of a new intensity value with a previous intensity time-average as determined by a weighting matrix.

24. The method of claim 23, wherein said mixing is influenced by a dynamic bias term that is a real-time estimate of a variance at said pixel.

25. The method of claim 23, wherein said weighting is determined by a relative stability of an observed value compared to a stability of the time-average.

26. The method of claim 23, wherein an amount of said mixing is based on a degree of change observed at said pixel.

27. The method of claim 23, wherein an amount of said mixing is reduced as a degree of motion at said pixel increases.

28. The method of claim 19, wherein said spatially-variant temporal smoothing is achieved by associating one or more independent Kalman filters with each pixel position.